Omg finally ready to analyse data! There are still some ambiguities in the tissue assignments, but I think that’s as good as it’s going to get.

So first we want to check how many studies ended up mixed between race and geography and do some cleaning:

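# Per-study cross-tab of geography labels (rows) against race labels (columns)
tabulatePops <- by(allSRAFinal, allSRAFinal$SRA.Study, function(x) table(x$finalGeography, x$finalRace))
# Keep studies where neither dimension is empty (no NULL dimnames), i.e. both kinds of label are in use
conflictStudies <- names(tabulatePops[lapply(tabulatePops, function(x) grep("NULL", dimnames(x))) %>% grepl(0, .)])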
length(conflictStudies) # That's a lot of messiness...
## [1] 36
length(unique(allSRAFinal$SRA.Study))
## [1] 263
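
The dimnames-based filter is a bit opaque, so here is a minimal sketch of what should be an equivalent check, run on a toy data frame. It assumes (as seems to be the case in `allSRAFinal`) that `finalGeography` and `finalRace` are `NA` when unassigned. The names `toy`, `hasBoth` and `toyConflicts` are made up for illustration and are not part of the real pipeline:

```r
# Toy stand-in for allSRAFinal (hypothetical data, for illustration only)
toy <- data.frame(
  SRA.Study      = c("S1", "S1", "S2", "S2", "S3", "S3"),
  finalGeography = c("Asia", NA, "Europe", "Europe", NA, NA),
  finalRace      = c(NA, "White", NA, NA, "White", "Asian"),
  stringsAsFactors = FALSE
)

# A study counts as "mixed" when at least one sample carries a geography
# label and at least one sample carries a race label.
hasBoth <- sapply(split(toy, toy$SRA.Study), function(x) {
  any(!is.na(x$finalGeography)) && any(!is.na(x$finalRace))
})

# Names of the mixed studies; only S1 uses both label types here
toyConflicts <- names(hasBoth)[hasBoth]
toyConflicts
```

Only the toy study that uses both label types is returned; studies that stick to one vocabulary are left out, which is the same idea as `conflictStudies`.
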
by(allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies,], allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies,]$SRA.Study, function(x) table(x$finalGeography, x$finalRace))
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: DRP001797
##                                
##                                 Black or African American White
##   East Asia                                             0     0
##   North Africa and Western Asia                         0     0
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: ERP116722
##                    
##                     Other
##   Subsaharan Africa     0
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: ERP117085
##           
##            Black or African American
##   Multiple                         0
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP063355
##       
##        Black or African American White
##   Asia                         0     0
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP070663
##       
##        Black or African American
##   Asia                         0
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP072417
##             
##              Asian Black or African American White
##   South Asia     0                         0     0
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP073813
##       
##        Black or African American Native Hawaiian or other Pacific Islander Other White
##   Asia                         0                                         0     0     0
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP074739
##             
##              Black or African American White
##   South Asia                         0     0
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP102952
##                 
##                  White
##   East Asia          0
##   South Asia         0
##   Southeast Asia     0
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP114762
##         
##          Black or African American
##   Europe                         0
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP116913
##             
##              American Indian or Alaskan Native Black or African American Other
##   Europe                                     0                         0     0
##   South Asia                                 0                         0     0
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP118614
##         
##          Black or African American
##   Europe                         0
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP125882
##           
##            Black or African American White
##   Americas                         0     0
##   Asia                             0     0
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP155483
##                    
##                     White
##   East Asia             0
##   South Asia            0
##   Southeast Asia        0
##   Subsaharan Africa     0
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP172694
##                    
##                     Multiple White
##   Subsaharan Africa        0     0
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP188296
##                                
##                                 American Indian or Alaskan Native Asian Black or African American White
##   North Africa and Western Asia                                 0     0                         0     0
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP190479
##                                
##                                 Native Hawaiian or other Pacific Islander White
##   Asia                                                                  0     0
##   North Africa and Western Asia                                         0     0
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP212343
##                                
##                                 Native Hawaiian or other Pacific Islander White
##   Asia                                                                  0     0
##   North Africa and Western Asia                                         0     0
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP212369
##                                
##                                 Native Hawaiian or other Pacific Islander White
##   Asia                                                                  0     0
##   North Africa and Western Asia                                         0     0
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP212370
##                                
##                                 Native Hawaiian or other Pacific Islander White
##   Asia                                                                  0     0
##   North Africa and Western Asia                                         0     0
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP216558
##             
##              White
##   Asia           0
##   South Asia     0
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP219483
##             
##              Black or African American White
##   South Asia                         0     0
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP221484
##           
##            Black or African American Multiple
##   Asia                             0        0
##   Europe                           0        0
##   Multiple                         0        0
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP226691
##            
##             White
##   East Asia     0
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP245400
##       
##        American Indian or Alaskan Native Black or African American White
##   Asia                                 0                         0     0
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP251118
##         
##          Black or African American
##   Europe                         0
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP268711
##             
##              Black or African American White
##   South Asia                         0     0
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP274641
##                    
##                     Multiple White
##   Subsaharan Africa        0     0
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP283115
##       
##        Black or African American Multiple White
##   Asia                         0        0     0
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP300191
##                    
##                     Asian Black or African American White
##   Subsaharan Africa     0                         0     0
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP303641
##                                
##                                 White
##   Asia                              0
##   North Africa and Western Asia     0
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP303646
##             
##              Asian Black or African American White
##   South Asia     0                         0     0
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP324614
##         
##          Black or African American
##   Europe                         0
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP363798
##                    
##                     White
##   Americas              0
##   Asia                  0
##   East Asia             0
##   South Asia            0
##   Subsaharan Africa     0
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP377781
##           
##            White
##   Asia         0
##   Multiple     0
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP388678
##                                
##                                 Black or African American Other White
##   Asia                                                  0     0     0
##   Europe                                                0     0     0
##   Multiple                                              0     0     0
##   North Africa and Western Asia                         0     0     0
tabulateTerms <- by(allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies,], allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies,]$SRA.Study, function(x) table(is.na(x$finalGeography), is.na(x$finalRace))) # So there's often a clear skew towards one category, which should make solving these easier...

tabulateTerms <- data.frame(melt(unlist(tabulateTerms)))
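# Cell order after unlist(): both labels present, race only, geography only, neither (assumes every table is a full 2 x 2)
tabulateTerms$condition <- rep(c("both", "race", "geography", "neither"), nrow(tabulateTerms)/4)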
tabulateTerms$SRA.Study <- str_sub(rownames(tabulateTerms), end=-2)

ggplot(tabulateTerms, aes(x = SRA.Study, y = value, fill=condition)) +
  geom_bar(stat="identity") +
  theme_bw() +
  ggtitle("Race or Ethnicity usage") +
  # xlab("finalSystem") +
  ylab("Count") +
  scale_fill_brewer(palette = "Set1") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
  theme(legend.position="bottom")
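
The `condition` labels used for the plot depend on the order in which each 2 x 2 table gets flattened: column by column, with `FALSE` before `TRUE`. Here is a small sketch on made-up values to sanity-check that order; `geo`, `race`, `tab` and `flat` are hypothetical names, not objects from the real analysis:

```r
# Made-up labels: 2 geography-only samples, 3 race-only samples, 1 with neither
geo  <- c("Asia", "Asia", NA, NA, NA, NA)
race <- c(NA, NA, "White", "White", "White", NA)

# Rows: is geography missing? Columns: is race missing?
tab <- table(is.na(geo), is.na(race))

# Flatten column by column, then attach the same labels used for the plot
flat <- as.vector(tab)
names(flat) <- c("both", "race", "geography", "neither")
flat
```

One caveat: if a study never has, say, a missing race, `table()` returns fewer than four cells and the recycled labels would shift. Wrapping each argument in `factor(..., levels = c(FALSE, TRUE))` is one way to guard against that.
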

From this it’s very clear what we should do for most studies! The vast majority of them should use only racial terms and (I’m guessing) swap Asian over to a racial descriptor. But anyhow, let’s manually spot-check some of these:

columnsILike <- c(1, 10, 15, 34:36) # Just need those two intermediate ones tbh

by(allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies,], allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies,]$SRA.Study, function(x) head(x[,columnsILike])) 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: DRP001797
##       SRA.Study ETHNICITY RACE finalRace                finalGeography hispanic
## 98154 DRP001797    Arabic <NA>      <NA> North Africa and Western Asia     <NA>
## 98176 DRP001797    Arabic <NA>      <NA> North Africa and Western Asia     <NA>
## 98300 DRP001797    Arabic <NA>      <NA> North Africa and Western Asia     <NA>
## 98394 DRP001797  Japanese <NA>      <NA>                     East Asia     <NA>
## 98440 DRP001797  Japanese <NA>      <NA>                     East Asia     <NA>
## 98443 DRP001797  Japanese <NA>      <NA>                     East Asia     <NA>
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: ERP116722
##        SRA.Study ETHNICITY RACE finalRace    finalGeography hispanic
## 188456 ERP116722  Mandingo <NA>      <NA> Subsaharan Africa     <NA>
## 188457 ERP116722      Jola <NA>      <NA> Subsaharan Africa     <NA>
## 188459 ERP116722  Mandingo <NA>      <NA> Subsaharan Africa     <NA>
## 188460 ERP116722     Fulla <NA>      <NA> Subsaharan Africa     <NA>
## 188461 ERP116722     Fulla <NA>      <NA> Subsaharan Africa     <NA>
## 188463 ERP116722      Jola <NA>      <NA> Subsaharan Africa     <NA>
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: ERP117085
##        SRA.Study      ETHNICITY RACE finalRace finalGeography hispanic
## 603830 ERP117085 mixed ancestry <NA>      <NA>       Multiple     <NA>
## 603831 ERP117085 mixed ancestry <NA>      <NA>       Multiple     <NA>
## 603832 ERP117085 mixed ancestry <NA>      <NA>       Multiple     <NA>
## 603834 ERP117085 mixed ancestry <NA>      <NA>       Multiple     <NA>
## 603835 ERP117085 mixed ancestry <NA>      <NA>       Multiple     <NA>
## 603836 ERP117085 mixed ancestry <NA>      <NA>       Multiple     <NA>
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP063355
##       SRA.Study ETHNICITY RACE                 finalRace finalGeography hispanic
## 87235 SRP063355     asian <NA>                      <NA>           Asia     <NA>
## 87242 SRP063355     asian <NA>                      <NA>           Asia     <NA>
## 87226 SRP063355     black <NA> Black or African American           <NA>     <NA>
## 87227 SRP063355     black <NA> Black or African American           <NA>     <NA>
## 87228 SRP063355     white <NA>                     White           <NA>     <NA>
## 87229 SRP063355     white <NA>                     White           <NA>     <NA>
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP070663
##        SRA.Study ETHNICITY RACE finalRace finalGeography hispanic
## 293582 SRP070663     Asian <NA>      <NA>           Asia     <NA>
## 293583 SRP070663     Asian <NA>      <NA>           Asia     <NA>
## 293584 SRP070663     Asian <NA>      <NA>           Asia     <NA>
## 293585 SRP070663     Asian <NA>      <NA>           Asia     <NA>
## 293586 SRP070663     Asian <NA>      <NA>           Asia     <NA>
## 293587 SRP070663     Asian <NA>      <NA>           Asia     <NA>
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP072417
##        SRA.Study ETHNICITY           RACE finalRace finalGeography hispanic
## 282324 SRP072417      <NA> White.Hispanic     White           <NA> hispanic
## 282325 SRP072417      <NA> White.Hispanic     White           <NA> hispanic
## 282326 SRP072417      <NA>          White     White           <NA>     <NA>
## 282327 SRP072417      <NA>          White     White           <NA>     <NA>
## 282328 SRP072417      <NA>          White     White           <NA>     <NA>
## 282329 SRP072417      <NA>          White     White           <NA>     <NA>
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP073813
##        SRA.Study ETHNICITY RACE finalRace finalGeography hispanic
## 139194 SRP073813     Asian <NA>      <NA>           Asia     <NA>
## 139219 SRP073813     Asian <NA>      <NA>           Asia     <NA>
## 139247 SRP073813     Asian <NA>      <NA>           Asia     <NA>
## 139420 SRP073813     Asian <NA>      <NA>           Asia     <NA>
## 139433 SRP073813     Asian <NA>      <NA>           Asia     <NA>
## 139527 SRP073813     Asian <NA>      <NA>           Asia     <NA>
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP074739
##        SRA.Study    ETHNICITY RACE finalRace finalGeography hispanic
## 189038 SRP074739 South Indian <NA>      <NA>     South Asia     <NA>
## 189040 SRP074739 South Indian <NA>      <NA>     South Asia     <NA>
## 189041 SRP074739 South Indian <NA>      <NA>     South Asia     <NA>
## 189042 SRP074739 South Indian <NA>      <NA>     South Asia     <NA>
## 189043 SRP074739 South Indian <NA>      <NA>     South Asia     <NA>
## 189014 SRP074739    Caucasian <NA>     White           <NA>     <NA>
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP102952
##        SRA.Study ETHNICITY      RACE finalRace finalGeography hispanic
## 119193 SRP102952      <NA> Caucasian     White           <NA>     <NA>
## 119195 SRP102952      <NA> Caucasian     White           <NA>     <NA>
## 119013 SRP102952      <NA>    Indian      <NA>     South Asia     <NA>
## 119015 SRP102952      <NA>    Indian      <NA>     South Asia     <NA>
## 119017 SRP102952      <NA>   Chinese      <NA>      East Asia     <NA>
## 119019 SRP102952      <NA>   Chinese      <NA>      East Asia     <NA>
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP114762
##        SRA.Study ETHNICITY RACE finalRace finalGeography hispanic
## 184554 SRP114762     White <NA>      <NA>         Europe     <NA>
## 184555 SRP114762     White <NA>      <NA>         Europe     <NA>
## 184556 SRP114762     White <NA>      <NA>         Europe     <NA>
## 184557 SRP114762     White <NA>      <NA>         Europe     <NA>
## 184558 SRP114762     White <NA>      <NA>         Europe     <NA>
## 184559 SRP114762     White <NA>      <NA>         Europe     <NA>
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP116913
##        SRA.Study                             ETHNICITY RACE finalRace finalGeography hispanic
## 589614 SRP116913 White - caucasian / european heritage <NA>      <NA>         Europe     <NA>
## 617071 SRP116913 White - caucasian / european heritage <NA>      <NA>         Europe     <NA>
## 617073 SRP116913 White - caucasian / european heritage <NA>      <NA>         Europe     <NA>
## 617074 SRP116913 White - caucasian / european heritage <NA>      <NA>         Europe     <NA>
## 617076 SRP116913 White - caucasian / european heritage <NA>      <NA>         Europe     <NA>
## 617077 SRP116913 White - caucasian / european heritage <NA>      <NA>         Europe     <NA>
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP118614
##        SRA.Study          ETHNICITY RACE finalRace finalGeography hispanic
## 248113 SRP118614 European Americans <NA>      <NA>         Europe     <NA>
## 248115 SRP118614 European Americans <NA>      <NA>         Europe     <NA>
## 248117 SRP118614 European Americans <NA>      <NA>         Europe     <NA>
## 248119 SRP118614 European Americans <NA>      <NA>         Europe     <NA>
## 248121 SRP118614 European Americans <NA>      <NA>         Europe     <NA>
## 248123 SRP118614 European Americans <NA>      <NA>         Europe     <NA>
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP125882
##        SRA.Study ETHNICITY RACE                 finalRace finalGeography hispanic
## 225638 SRP125882  Oriental <NA>                      <NA>           Asia     <NA>
## 225640 SRP125882     Latin <NA>                      <NA>       Americas     <NA>
## 225613 SRP125882 Caucasian <NA>                     White           <NA>     <NA>
## 225614 SRP125882     Black <NA> Black or African American           <NA>     <NA>
## 225615 SRP125882     Black <NA> Black or African American           <NA>     <NA>
## 225616 SRP125882 Caucasian <NA>                     White           <NA>     <NA>
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP155483
##        SRA.Study ETHNICITY RACE finalRace    finalGeography hispanic
## 211874 SRP155483      <NA> <NA>      <NA>         East Asia     <NA>
## 211875 SRP155483      <NA> <NA>      <NA>        South Asia     <NA>
## 211879 SRP155483      <NA> <NA>      <NA>         East Asia     <NA>
## 211882 SRP155483      <NA> <NA>      <NA>    Southeast Asia     <NA>
## 211934 SRP155483      <NA> <NA>      <NA>        South Asia     <NA>
## 211954 SRP155483      <NA> <NA>      <NA> Subsaharan Africa     <NA>
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP172694
##        SRA.Study             ETHNICITY RACE finalRace    finalGeography hispanic
## 677192 SRP172694               African <NA>      <NA> Subsaharan Africa     <NA>
## 677196 SRP172694               African <NA>      <NA> Subsaharan Africa     <NA>
## 677186 SRP172694 Asian-Pacificlslander <NA>  Multiple              <NA>     <NA>
## 677188 SRP172694             Caucasian <NA>     White              <NA>     <NA>
## 677204 SRP172694 Asian-Pacificlslander <NA>  Multiple              <NA>     <NA>
## 677208 SRP172694              Hispanic <NA>      <NA>              <NA> hispanic
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP188296
##        SRA.Study ETHNICITY             RACE                         finalRace finalGeography hispanic
## 375085 SRP188296      <NA>        Caucasian                             White           <NA>     <NA>
## 375086 SRP188296      <NA>        Caucasian                             White           <NA>     <NA>
## 375087 SRP188296      <NA>  Native American American Indian or Alaskan Native           <NA>     <NA>
## 375088 SRP188296      <NA>            Asian                             Asian           <NA>     <NA>
## 375089 SRP188296      <NA>        Caucasian                             White           <NA>     <NA>
## 375090 SRP188296      <NA> African/American         Black or African American           <NA>     <NA>
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP190479
##        SRA.Study      ETHNICITY RACE finalRace                finalGeography hispanic
## 596096 SRP190479          Asian <NA>      <NA>                          Asia     <NA>
## 596130 SRP190479 Middle Eastern <NA>      <NA> North Africa and Western Asia     <NA>
## 596077 SRP190479      Caucasian <NA>     White                          <NA>     <NA>
## 596078 SRP190479      Caucasian <NA>     White                          <NA>     <NA>
## 596079 SRP190479      Caucasian <NA>     White                          <NA>     <NA>
## 596080 SRP190479      Caucasian <NA>     White                          <NA>     <NA>
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP212343
##        SRA.Study      ETHNICITY RACE finalRace                finalGeography hispanic
## 596482 SRP212343          Asian <NA>      <NA>                          Asia     <NA>
## 596558 SRP212343 Middle Eastern <NA>      <NA> North Africa and Western Asia     <NA>
## 596478 SRP212343      Caucasian <NA>     White                          <NA>     <NA>
## 596479 SRP212343      Caucasian <NA>     White                          <NA>     <NA>
## 596480 SRP212343      Caucasian <NA>     White                          <NA>     <NA>
## 596481 SRP212343      Caucasian <NA>     White                          <NA>     <NA>
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP212369
##        SRA.Study      ETHNICITY RACE finalRace                finalGeography hispanic
## 596572 SRP212369 Middle Eastern <NA>      <NA> North Africa and Western Asia     <NA>
## 596590 SRP212369          Asian <NA>      <NA>                          Asia     <NA>
## 596565 SRP212369      Caucasian <NA>     White                          <NA>     <NA>
## 596566 SRP212369      Caucasian <NA>     White                          <NA>     <NA>
## 596567 SRP212369      Caucasian <NA>     White                          <NA>     <NA>
## 596568 SRP212369      Caucasian <NA>     White                          <NA>     <NA>
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP212370
##        SRA.Study      ETHNICITY RACE finalRace                finalGeography hispanic
## 596639 SRP212370 Middle Eastern <NA>      <NA> North Africa and Western Asia     <NA>
## 596676 SRP212370          Asian <NA>      <NA>                          Asia     <NA>
## 596625 SRP212370      Caucasian <NA>     White                          <NA>     <NA>
## 596626 SRP212370      Caucasian <NA>     White                          <NA>     <NA>
## 596627 SRP212370      Caucasian <NA>     White                          <NA>     <NA>
## 596628 SRP212370      Caucasian <NA>     White                          <NA>     <NA>
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP216558
##        SRA.Study ETHNICITY RACE finalRace finalGeography hispanic
## 685098 SRP216558     Asian <NA>      <NA>           Asia     <NA>
## 685100 SRP216558    Indian <NA>      <NA>     South Asia     <NA>
## 685101 SRP216558    Indian <NA>      <NA>     South Asia     <NA>
## 685102 SRP216558    Indian <NA>      <NA>     South Asia     <NA>
## 685103 SRP216558    Indian <NA>      <NA>     South Asia     <NA>
## 685104 SRP216558    Indian <NA>      <NA>     South Asia     <NA>
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP219483
##        SRA.Study ETHNICITY      RACE                 finalRace finalGeography hispanic
## 660041 SRP219483      <NA>     black Black or African American           <NA>     <NA>
## 660042 SRP219483      <NA> caucasian                     White           <NA>     <NA>
## 660043 SRP219483      <NA> caucasian                     White           <NA>     <NA>
## 660046 SRP219483      <NA> caucasian                     White           <NA>     <NA>
## 660047 SRP219483      <NA> caucasian                     White           <NA>     <NA>
## 660048 SRP219483      <NA> caucasian                     White           <NA>     <NA>
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP221484
##        SRA.Study ETHNICITY RACE finalRace finalGeography hispanic
## 612274 SRP221484      <NA> <NA>      <NA>         Europe     <NA>
## 612277 SRP221484      <NA> <NA>      <NA>         Europe     <NA>
## 612279 SRP221484      <NA> <NA>      <NA>         Europe     <NA>
## 612286 SRP221484      <NA> <NA>      <NA>         Europe     <NA>
## 612287 SRP221484      <NA> <NA>      <NA>         Europe     <NA>
## 612288 SRP221484      <NA> <NA>      <NA>         Europe     <NA>
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP226691
##        SRA.Study ETHNICITY      RACE finalRace finalGeography hispanic
## 562018 SRP226691      <NA> Caucasian     White           <NA>     <NA>
## 562022 SRP226691      <NA>  Japanese      <NA>      East Asia     <NA>
## 562026 SRP226691      <NA>  Japanese      <NA>      East Asia     <NA>
## 562030 SRP226691      <NA>  Japanese      <NA>      East Asia     <NA>
## 562034 SRP226691      <NA>  Japanese      <NA>      East Asia     <NA>
## 562038 SRP226691      <NA>  Japanese      <NA>      East Asia     <NA>
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP245400
##        SRA.Study ETHNICITY RACE finalRace finalGeography hispanic
## 556745 SRP245400     Asian <NA>      <NA>           Asia     <NA>
## 556803 SRP245400     Asian <NA>      <NA>           Asia     <NA>
## 556865 SRP245400     Asian <NA>      <NA>           Asia     <NA>
## 556870 SRP245400     Asian <NA>      <NA>           Asia     <NA>
## 556871 SRP245400     Asian <NA>      <NA>           Asia     <NA>
## 556872 SRP245400     Asian <NA>      <NA>           Asia     <NA>
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP251118
##        SRA.Study ETHNICITY RACE finalRace finalGeography hispanic
## 782356 SRP251118       EUR <NA>      <NA>         Europe     <NA>
## 782358 SRP251118       EUR <NA>      <NA>         Europe     <NA>
## 782359 SRP251118       EUR <NA>      <NA>         Europe     <NA>
## 782360 SRP251118       EUR <NA>      <NA>         Europe     <NA>
## 782367 SRP251118       EUR <NA>      <NA>         Europe     <NA>
## 782369 SRP251118       EUR <NA>      <NA>         Europe     <NA>
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP268711
##        SRA.Study ETHNICITY                   RACE                 finalRace finalGeography hispanic
## 666767 SRP268711      <NA>              Caucasian                     White           <NA>     <NA>
## 666772 SRP268711      <NA>              Caucasian                     White           <NA>     <NA>
## 666774 SRP268711      <NA>              Caucasian                     White           <NA>     <NA>
## 666786 SRP268711      <NA> Black/African American Black or African American           <NA>     <NA>
## 666793 SRP268711      <NA> Black/African American Black or African American           <NA>     <NA>
## 666795 SRP268711      <NA> Black/African American Black or African American           <NA>     <NA>
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP274641
##        SRA.Study             ETHNICITY RACE finalRace    finalGeography hispanic
## 860020 SRP274641               African <NA>      <NA> Subsaharan Africa     <NA>
## 860024 SRP274641               African <NA>      <NA> Subsaharan Africa     <NA>
## 860014 SRP274641 Asian-Pacificlslander <NA>  Multiple              <NA>     <NA>
## 860016 SRP274641             Caucasian <NA>     White              <NA>     <NA>
## 860032 SRP274641 Asian-Pacificlslander <NA>  Multiple              <NA>     <NA>
## 860036 SRP274641              Hispanic <NA>      <NA>              <NA> hispanic
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP283115
##        SRA.Study              ETHNICITY RACE                 finalRace finalGeography hispanic
## 859178 SRP283115                  Asian <NA>                      <NA>           Asia     <NA>
## 859179 SRP283115                  Asian <NA>                      <NA>           Asia     <NA>
## 859173 SRP283115              Caucasian <NA>                     White           <NA>     <NA>
## 859176 SRP283115        Hispanic/Latino <NA>                      <NA>           <NA> hispanic
## 859181 SRP283115              Caucasian <NA>                     White           <NA>     <NA>
## 859182 SRP283115 Black/African American <NA> Black or African American           <NA>     <NA>
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP300191
##        SRA.Study ETHNICITY               RACE                 finalRace finalGeography hispanic
## 802400 SRP300191      <NA>   African American Black or African American           <NA>     <NA>
## 802401 SRP300191      <NA> Caucasian American                     White           <NA>     <NA>
## 802402 SRP300191      <NA> Caucasian American                     White           <NA>     <NA>
## 802403 SRP300191      <NA> Caucasian American                     White           <NA>     <NA>
## 802404 SRP300191      <NA>   African American Black or African American           <NA>     <NA>
## 802405 SRP300191      <NA>   African American Black or African American           <NA>     <NA>
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP303641
##        SRA.Study     ETHNICITY RACE finalRace                finalGeography hispanic
## 845507 SRP303641         Asian <NA>      <NA>                          Asia     <NA>
## 845508 SRP303641         Asian <NA>      <NA>                          Asia     <NA>
## 845523 SRP303641 North african <NA>      <NA> North Africa and Western Asia     <NA>
## 845524 SRP303641 North african <NA>      <NA> North Africa and Western Asia     <NA>
## 845533 SRP303641         Asian <NA>      <NA>                          Asia     <NA>
## 845534 SRP303641         Asian <NA>      <NA>                          Asia     <NA>
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP303646
##        SRA.Study ETHNICITY             RACE                 finalRace finalGeography hispanic
## 763903 SRP303646      <NA> African American Black or African American           <NA>     <NA>
## 763904 SRP303646      <NA>            White                     White           <NA>     <NA>
## 763905 SRP303646      <NA> African American Black or African American           <NA>     <NA>
## 763906 SRP303646      <NA> African American Black or African American           <NA>     <NA>
## 763907 SRP303646      <NA> African American Black or African American           <NA>     <NA>
## 763908 SRP303646      <NA> African American Black or African American           <NA>     <NA>
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP324614
##         SRA.Study                 ETHNICITY RACE                 finalRace finalGeography hispanic
## 1098430 SRP324614                     White <NA>                      <NA>         Europe     <NA>
## 1098431 SRP324614                     White <NA>                      <NA>         Europe     <NA>
## 1098432 SRP324614                     White <NA>                      <NA>         Europe     <NA>
## 1098433 SRP324614                     White <NA>                      <NA>         Europe     <NA>
## 1098428 SRP324614 Black or African American <NA> Black or African American           <NA>     <NA>
## 1098429 SRP324614 Black or African American <NA> Black or African American           <NA>     <NA>
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP363798
##         SRA.Study ETHNICITY RACE finalRace    finalGeography hispanic
## 1047027 SRP363798     Asian <NA>      <NA>              Asia     <NA>
## 1047028 SRP363798     Asian <NA>      <NA>              Asia     <NA>
## 1047031 SRP363798   African <NA>      <NA> Subsaharan Africa     <NA>
## 1047032 SRP363798   African <NA>      <NA> Subsaharan Africa     <NA>
## 1047033 SRP363798     Asian <NA>      <NA>              Asia     <NA>
## 1047034 SRP363798     Asian <NA>      <NA>              Asia     <NA>
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP377781
##         SRA.Study ETHNICITY RACE finalRace finalGeography hispanic
## 1164886 SRP377781     asian <NA>      <NA>           Asia     <NA>
## 1164891 SRP377781     mixed <NA>      <NA>       Multiple     <NA>
## 1164900 SRP377781     asian <NA>      <NA>           Asia     <NA>
## 1164860 SRP377781 caucasian <NA>     White           <NA>     <NA>
## 1164861 SRP377781 caucasian <NA>     White           <NA>     <NA>
## 1164862 SRP377781 caucasian <NA>     White           <NA>     <NA>
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP388678
##         SRA.Study       ETHNICITY RACE finalRace                finalGeography hispanic
## 1101373 SRP388678           Asian <NA>      <NA>                          Asia     <NA>
## 1101376 SRP388678 Mixed Ethnicity <NA>      <NA>                      Multiple     <NA>
## 1101379 SRP388678           Asian <NA>      <NA>                          Asia     <NA>
## 1101381 SRP388678           Asian <NA>      <NA>                          Asia     <NA>
## 1101384 SRP388678  Middle Eastern <NA>      <NA> North Africa and Western Asia     <NA>
## 1101387 SRP388678  Middle Eastern <NA>      <NA> North Africa and Western Asia     <NA>
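# Tabulate the raw RACE labels within each conflicting study
# (studies with no non-missing RACE values print as "table of extent 0")
by(allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies,], allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies,]$SRA.Study, function(x) table(x$RACE))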
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: DRP001797
## < table of extent 0 >
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: ERP116722
## < table of extent 0 >
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: ERP117085
## < table of extent 0 >
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP063355
## < table of extent 0 >
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP070663
## < table of extent 0 >
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP072417
## 
## African American            Asian         Hispanic      South Asian            White   White.Hispanic 
##               20               36               14               19              250                2 
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP073813
## < table of extent 0 >
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP074739
## < table of extent 0 >
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP102952
## 
##  Bruneian Caucasian   Chinese    Indian     Malay 
##         2         2       106        24        32 
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP114762
## < table of extent 0 >
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP116913
## < table of extent 0 >
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP118614
## < table of extent 0 >
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP125882
## < table of extent 0 >
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP155483
## < table of extent 0 >
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP172694
## < table of extent 0 >
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP188296
## 
## African American African/American           Arabic            Asian        Caucasian  Native American 
##                1                1                1               15               12                3 
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP190479
## < table of extent 0 >
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP212343
## < table of extent 0 >
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP212369
## < table of extent 0 >
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP212370
## < table of extent 0 >
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP216558
## < table of extent 0 >
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP219483
## 
## African American     Asian Indian            black        caucasian 
##                1                1                1                7 
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP221484
## < table of extent 0 >
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP226691
## 
## Caucasian  Japanese 
##         1        12 
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP245400
## < table of extent 0 >
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP251118
## < table of extent 0 >
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP268711
## 
##         Asian (Nepali) Black/African American              Caucasian        Hispanic/Latino 
##                     18                     35                     52                     15 
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP274641
## < table of extent 0 >
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP283115
## < table of extent 0 >
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP300191
## 
##   African American     Asian American Caucasian American    Eastern African 
##                 15                  1                 15                  1 
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP303641
## < table of extent 0 >
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP303646
## 
## African American            Asian       Casucasian       caucaisian         Hispanic           Indian            White 
##               23                1                1                1                3                1                7 
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP324614
## < table of extent 0 >
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP363798
## < table of extent 0 >
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP377781
## < table of extent 0 >
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP388678
## < table of extent 0 >
# Same tabulation for the raw ETHNICITY labels
by(allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies,], allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies,]$SRA.Study, function(x) table(x$ETHNICITY))
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: DRP001797
## 
## African american           Arabic        Caucasian         Hispanic         Japanese 
##                8                3               19                2               10 
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: ERP116722
## 
##    Fulla     Jola Mandingo  Manjago    Other   Wollof 
##       14        7       15        2        2        4 
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: ERP117085
## 
##          black mixed ancestry 
##             28            153 
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP063355
## 
## asian black white 
##     2     5    11 
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP070663
## 
## African-American            Asian  Hispanic-Latino 
##                3                6                3 
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP072417
## < table of extent 0 >
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP073813
## 
## African American            Asian        Caucasian            Other Pacific Islander 
##                6                6              260                6                3 
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP074739
## 
## Africa American       Caucasian    South Indian 
##               5              67               5 
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP102952
## < table of extent 0 >
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP114762
## 
## Black White 
##     2    16 
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP116913
## 
##   African heritage / african american     American indian or alaskan native  Asian - central/south asian heritage                                 Other 
##                                   264                                    14                                    14                                    41 
## White - caucasian / european heritage 
##                                   305 
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP118614
## 
##  African Americans European Americans 
##                 16                 16 
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP125882
## 
##     Black Caucasian     Latin  Oriental 
##        13        25         1         1 
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP155483
## < table of extent 0 >
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP172694
## 
##               African Asian-Pacificlslander             Caucasian              Hispanic 
##                     2                     2                     1                     1 
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP188296
## < table of extent 0 >
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP190479
## 
##            Asian        Caucasian   Middle Eastern Pacific Islander 
##                1               56                1                2 
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP212343
## 
##            Asian        Caucasian   Middle Eastern Pacific Islander 
##                1               54                1                2 
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP212369
## 
##            Asian        Caucasian   Middle Eastern Pacific Islander 
##                1               56                1                2 
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP212370
## 
##            Asian        Caucasian   Middle Eastern Pacific Islander 
##                1               56                1                2 
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP216558
## 
##     Asian Caucasian    Indian 
##         1         1        12 
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP219483
## < table of extent 0 >
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP221484
## < table of extent 0 >
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP226691
## < table of extent 0 >
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP245400
## 
## African American            Asian        Caucasian         Hispanic  Native American 
##               94               38              188               23                2 
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP251118
## 
##  AA EUR 
##  34  37 
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP268711
## < table of extent 0 >
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP274641
## 
##               African Asian-Pacificlslander             Caucasian              Hispanic 
##                     2                     2                     1                     1 
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP283115
## 
##                  Asian          Asian/Pacific          Black/African Black/African American              Caucasian        Hispanic/Latino 
##                      2                      2                      1                      3                      6                      3 
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP300191
## < table of extent 0 >
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP303641
## 
##         Asian     Caucasian      Hispanic North african 
##             4            92             2             2 
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP303646
## < table of extent 0 >
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP324614
## 
## Black or African American                     White 
##                         3                         4 
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP363798
## 
##                         African                           Asian Asian or Asian British - Indian                     Bangladeshi                       Caribbean 
##                               2                              30                               2                               2                               2 
##                       Caucasian                         Chinese                        Jamaican 
##                              50                               2                               2 
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP377781
## 
##     asian caucasian     mixed 
##         2        38         1 
## --------------------------------------------------------------------------------------------------------------------------------------- 
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP388678
## 
## African American/Black       Ashkenazi Jewish                  Asian              Caucasian               Hispanic         Middle Eastern        Mixed Ethnicity 
##                     42                     12                     66                    305                     30                      9                     26 
##                  Other 
##                      2

On the basis of that, studies with info originally in the race column:

1. SRP072417: Racial, but distinguishes between South Asian and Asian (should just collapse).
2. SRP102952: Caucasian when the rest is clearly from ISEA; should be geography.
3. SRP188296: Single Arabic individual.
4. SRP219483: Single Asian Indian individual, but distinguishes between African American and Black (sigh).
5. SRP226691: Single Caucasian and a lot of Japanese?
6. SRP268711: Asian (Nepali); should be race: Asian?
7. SRP300191: Eastern African distinct from African American.
8. SRP303646: Single Indian individual (and two ways of misspelling Caucasian).

And now the flip side (only some of these are updated below):

1. DRP001797: Mixture of terms (Arabic, Japanese, and census terms).
2. ERP116722: Distinct African groups and ‘other’, which is getting parsed incorrectly. Should be all geography.
3. ERP117085: Black vs mixed geography; comes from UCL, so would move 100% to ethnicity.
4. SRP063355: Should be all race.
5. SRP070663: Should be all race.
6. SRP073813: Should be all race.
7. SRP074739: 5 South Indian individuals, not clear.
8. SRP114762: Black and white… should keep it consistent no matter what.
9. SRP116913: Should be all race.
10. SRP118614: IDEK
11. SRP125882: IDEK
12. SRP172694, SRP274641: Should be race minus the Hispanic sample. AsianPacific Islander is messing things up; should be Pacific Islander. This one is annoying to fix, so doing it later.
13. SRP190479, SRP212343, SRP212369, SRP212370: Middle Eastern keeping it from being race.
14. SRP216558: IDEK
15. SRP245400: Should be all race minus the Hispanic.
16. SRP251118: Probably race.
17. SRP283115: AsianPacific should probably be Pacific Islander. This one is annoying to fix, so doing it later.
18. SRP303641: North African, otherwise race/Hispanic.
19. SRP324614: Should be race.
20. SRP363798: Caucasian should go to Europe?
21. SRP377781: IDEK
22. SRP388678: IDEK
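
Most of the decisions listed above come down to the same two moves per study: fill the target column from the other one where the target is empty, then blank the source. A minimal sketch on toy data, assuming a hypothetical helper `moveDescriptor()` and invented study IDs (`S1`, `S2`); it uses base R only and is not the code run in this notebook:

```r
# Hypothetical helper: for the given studies, fill `to` from `from` wherever
# `to` is missing, then set `from` to NA so each sample keeps one descriptor.
moveDescriptor <- function(df, studies, from, to, studyCol = "SRA.Study") {
  rows <- df[[studyCol]] %in% studies
  fill <- rows & is.na(df[[to]])
  df[[to]][fill] <- df[[from]][fill]
  df[[from]][rows] <- NA
  df
}

# Toy data, invented for illustration only
toy <- data.frame(
  SRA.Study      = c("S1", "S1", "S2", "S2"),
  finalGeography = c("Asia", NA, "Europe", NA),
  finalRace      = c(NA, "White", NA, "Asian"),
  stringsAsFactors = FALSE
)

# Treat study S1 as "should be all race"
fixed <- moveDescriptor(toy, "S1", from = "finalGeography", to = "finalRace")
print(fixed)
```

Moved values keep their original vocabulary (here `Asia` ends up in `finalRace`), so a recoding pass is still needed afterwards.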

Some of the easier fixes are implemented here (this should probably happen earlier, but let’s wait for a reply to my email first). Annoying to do it manually, but here we are.

# I don't trust the ordering to be maintained... 
toGeography <- allSRAFinal$SRA.Study %in% c("ERP116722", "SRP102952")
allSRAFinal$finalGeography[toGeography] <- coalesce(allSRAFinal$finalGeography[toGeography], allSRAFinal$finalRace[toGeography])

toRace <- allSRAFinal$SRA.Study %in% c("SRP063355", "SRP070663", "SRP073813", "SRP116913", "SRP245400", "SRP251118", "SRP324614", "SRP268711", "SRP072417")
allSRAFinal$finalRace[toRace] <- coalesce(allSRAFinal$finalRace[toRace], allSRAFinal$finalGeography[toRace])

# Seems like ordering is maintained, so...
table(allSRAFinal$finalGeography)
## 
##                      Americas                          Asia                     East Asia                        Europe                      Multiple North Africa and Western Asia 
##                           164                           304                          1333                          3281                           206                            27 
##                         Other                    South Asia                Southeast Asia             Subsaharan Africa                         White 
##                             3                           773                            83                           524                             2
table(allSRAFinal$finalRace)
## 
##         American Indian or Alaskan Native                                      Asia                                     Asian                 Black or African American 
##                                        83                                        52                                       658                                      2993 
##                                    Europe                                  Multiple Native Hawaiian or other Pacific Islander                                     Other 
##                                       346                                       244                                        19                                       278 
##                                South Asia                                     White 
##                                        51                                     12538
# And now we need to update some of the terms that have changed:
allSRAFinal$finalGeography <- gsub("White", "Europe", allSRAFinal$finalGeography)
allSRAFinal$finalRace <- gsub("South Asia", "Asian", allSRAFinal$finalRace) %>% gsub("Europe", "White", .) %>% gsub("Asia$", "Asian", .) %>% gsub("Subsaharan Africa", "Black or African American", .)

# And set the other descriptor to NA:
allSRAFinal[allSRAFinal$SRA.Study %in% c("ERP116722", "SRP102952"), ]$finalRace <- NA
allSRAFinal[allSRAFinal$SRA.Study %in% c("SRP063355", "SRP070663", "SRP073813", "SRP116913", "SRP245400", "SRP251118", "SRP324614", "SRP268711", "SRP072417"), ]$finalGeography <- NA

# And now let's make the plot again to see if things have improved:
tabulatePops2 <- by(allSRAFinal, allSRAFinal$SRA.Study, function(x) table(x$finalGeography, x$finalRace)) 
conflictStudies2 <- names(tabulatePops2[lapply(tabulatePops2, function(x) grep("NULL", dimnames(x))) %>% grepl(0, .)])
length(conflictStudies2) 
## [1] 25
tabulateTerms2 <- by(allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies2,], allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies2,]$SRA.Study, function(x) table(is.na(x$finalGeography), is.na(x$finalRace))) # So there's often a clear skew towards one category, which should make solving these easier...

tabulateTerms2 <- data.frame(melt(unlist(tabulateTerms2)))
tabulateTerms2$condition <- rep(c("both", "race", "geography", "neither"), nrow(tabulateTerms2)/4)
tabulateTerms2$SRA.Study <- str_sub(rownames(tabulateTerms2), end=-2)

ggplot(tabulateTerms2, aes(x = SRA.Study, y = value, fill=condition)) +
  geom_bar(stat="identity") +
  theme_bw() +
  ggtitle("Race or Ethnicity usage") +
  # xlab("finalSystem") +
  ylab("Count") +
  scale_fill_brewer(palette = "Set1") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
  theme(legend.position="bottom")
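
One caveat with the labelling step above: `table()` on two logical vectors only returns cells for values that actually occur, so a study where one descriptor has no missing values can yield fewer than four cells, and the `rep(c("both", "race", "geography", "neither"), ...)` labels would then drift out of step. A minimal base-R sketch of one way to guard against this, assuming a hypothetical helper `naPattern()` and toy data; counting each combination explicitly always gives four named values per study:

```r
# Hypothetical helper: count samples by which descriptor(s) are present.
# Each combination is counted explicitly, so empty cells come back as 0.
naPattern <- function(geography, race) {
  hasGeo  <- !is.na(geography)
  hasRace <- !is.na(race)
  c(both      = sum(hasGeo & hasRace),
    geography = sum(hasGeo & !hasRace),
    race      = sum(!hasGeo & hasRace),
    neither   = sum(!hasGeo & !hasRace))
}

# Toy data, invented for illustration only
toy <- data.frame(
  SRA.Study      = c("S1", "S1", "S1", "S2", "S2"),
  finalGeography = c("Asia", NA, NA, "Europe", "Europe"),
  finalRace      = c(NA, "White", "White", NA, NA),
  stringsAsFactors = FALSE
)

# One row per study, one column per condition
patterns <- t(sapply(split(toy, toy$SRA.Study),
                     function(x) naPattern(x$finalGeography, x$finalRace)))
print(patterns)

# Long format (study, condition, value), ready for a stacked bar plot
long <- as.data.frame(as.table(patterns), responseName = "value")
names(long)[1:2] <- c("SRA.Study", "condition")
print(long)
```

Because the condition names travel with the counts, nothing depends on the order in which `unlist()` flattens the tables.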

Basic plots of population descriptors, now that we’re happy with that.

sampleGeography <- allSRAFinal %>% count(SRA.Study, finalGeography) %>% drop_na(finalGeography)
sampleRace <- allSRAFinal %>% count(SRA.Study, finalRace) %>% drop_na(finalRace)
meltSampleGeography <- melt(sampleGeography)
## Using SRA.Study, finalGeography as id variables
meltSampleRace <- melt(sampleRace)
## Using SRA.Study, finalRace as id variables
table(allSRAFinal$finalGeography)
## 
##                      Americas                          Asia                     East Asia                        Europe                      Multiple North Africa and Western Asia 
##                           164                           252                          1333                          2937                           206                            27 
##                         Other                    South Asia                Southeast Asia             Subsaharan Africa 
##                             3                           722                            83                           524
table(allSRAFinal$finalRace)
## 
##         American Indian or Alaskan Native                                     Asian                 Black or African American                                  Multiple 
##                                        83                                       761                                      2993                                       244 
## Native Hawaiian or other Pacific Islander                                     Other                                     White 
##                                        19                                       276                                     12882
sum(table(allSRAFinal$finalGeography))
## [1] 6251
sum(table(allSRAFinal$finalRace))
## [1] 17258
ggplot(meltSampleGeography, aes(x = finalGeography, y=value,fill=finalGeography)) +
  geom_boxplot(width=0.5, alpha=0.4) +
  geom_jitter(size=1, width=0.2) +
  theme_bw() +
  ggtitle("Geography across all studies") +
  # xlab("System") +
  ylab("Count") +
  scale_y_continuous(trans='log10') +
  scale_fill_viridis(discrete=T, na.value="grey50", option="plasma") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
  theme(legend.position = "none")

ggplot(meltSampleGeography, aes(x = finalGeography, y=value,fill=finalGeography)) +
  geom_bar(stat="identity", alpha=0.6) +
  theme_bw() +
  ggtitle("Geography across all studies") +
  # xlab("System") +
  ylab("Count") +
  # scale_y_continuous(trans='log10') +
  scale_fill_viridis(discrete=T, na.value="grey50", option="plasma") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
  theme(legend.position = "none")

ggplot(meltSampleRace, aes(x = finalRace, y=value,fill=finalRace)) +
  geom_boxplot(width=0.5, alpha=0.4) +
  geom_jitter(size=1, width=0.2) +
  theme_bw() +
  ggtitle("Race across all studies") +
  # xlab("System") +
  ylab("Count") +
  scale_y_continuous(trans='log10') +
  scale_fill_viridis(discrete=T, na.value="grey50") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
  theme(legend.position = "none")

ggplot(meltSampleRace, aes(x = finalRace, y=value,fill=finalRace)) +
  geom_bar(stat="identity", alpha=0.6) +
  theme_bw() +
  ggtitle("Race across all studies") +
  # xlab("System") +
  ylab("Count") +
  # scale_y_continuous(trans='log10') +
  scale_fill_viridis(discrete=T, na.value="grey50") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
  theme(legend.position = "none")

Yeah that looks good enough. At this point it might be interesting to see where the studies are coming from… but that is harder to parse than you might expect, so we’ll save that for later.

First, some summary statistics and plots:

# First we focus on population descriptors:
bigGeographySummary <- allSRAFinal %>% count(SRA.Study, finalGeography, finalSystem, finalOrgan)
bigRaceSummary <- allSRAFinal %>% count(SRA.Study, finalRace, finalSystem, finalOrgan)
GeographySummary <- allSRAFinal %>% count(finalGeography, finalSystem, finalOrgan)
raceSummary <- allSRAFinal %>% count(finalRace, finalSystem, finalOrgan)

meltGeography <- melt(GeographySummary)
## Using finalGeography, finalSystem, finalOrgan as id variables
meltRace <- melt(raceSummary)
## Using finalRace, finalSystem, finalOrgan as id variables
meltGeography %>% drop_na(c(finalSystem, finalGeography)) %>%
  ggplot(., aes(x = finalSystem, y=value, fill=finalGeography)) +
    geom_bar(stat="identity") +
    ggtitle("Biological System by Geography") +
    # xlab("finalSystem") +
    ylab("Count") +
    theme_bw() +
    scale_fill_viridis(discrete=T, na.value="grey50", option="plasma") +
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
    theme(legend.position="bottom")

meltGeography %>% drop_na(c(finalOrgan, finalGeography)) %>%
  ggplot(., aes(x = finalOrgan, y= value, fill=finalGeography)) +
    geom_bar(stat="identity") +
    theme_bw() +
    ggtitle("Organ by Geography") +
    # xlab("finalOrgan") +
    ylab("Count") +
    scale_fill_viridis(discrete=T, na.value="grey50", option="plasma") +
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
    theme(legend.position="bottom")

meltGeography %>% drop_na(c(finalOrgan, finalSystem, finalGeography)) %>%
ggplot(., aes(x = finalOrgan, y= value, color=finalGeography, fill=finalGeography)) +
  geom_jitter(size = 4, width=0.2) +
  theme_bw() +
  ggtitle("Organ by geography") +
  # xlab("Organ") +
  ylab("Count") +
  scale_y_continuous(trans='log10') +
  scale_color_viridis(discrete=T, na.value="grey50", option="plasma") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
  theme(legend.position = "bottom")

meltGeography %>% drop_na(c(finalSystem, finalGeography)) %>%
ggplot(., aes(x = finalSystem, y= value, color=finalGeography, fill=finalGeography)) +
  geom_jitter(size = 4, width=0.2) +
  theme_bw() +
  ggtitle("System by geography") +
  # xlab("System") +
  ylab("Count") +
  scale_y_continuous(trans='log10') +
  scale_color_viridis(discrete=T, na.value="grey50", option="plasma") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
  theme(legend.position = "bottom")

meltRace %>% drop_na(c(finalSystem, finalRace)) %>%
  ggplot(., aes(x = finalSystem, y= value, fill=finalRace)) +
    geom_bar(stat="identity") +
    theme_bw() +
    ggtitle("Biological System by Race") +
    # xlab("finalSystem") +
    ylab("Count") +
    scale_fill_viridis(discrete=T, na.value="grey50") +
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
    theme(legend.position="bottom")

meltRace %>% drop_na(c(finalOrgan, finalRace)) %>%
  ggplot(., aes(x = finalOrgan, y= value, fill=finalRace)) +
    geom_bar(stat="identity") +
    theme_bw() +
    ggtitle("Organ by Race") +
    # xlab("finalSystem") +
    ylab("Count") +
    scale_fill_viridis(discrete=T, na.value="grey50") +
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
    theme(legend.position="bottom")

meltRace %>% drop_na(c(finalOrgan, finalSystem, finalRace)) %>%
ggplot(., aes(x = finalOrgan, y= value, color=finalRace, fill=finalRace)) +
  geom_jitter(size = 4, width=0.2) +
  theme_bw() +
  ggtitle("Organ by race") +
  # xlab("Organ") +
  ylab("Count") +
  scale_y_continuous(trans='log10') +
  scale_color_viridis(discrete=T, na.value="grey50") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
  theme(legend.position = "bottom")

meltRace %>% drop_na(c(finalSystem, finalRace)) %>%
ggplot(., aes(x = finalSystem, y= value, color=finalRace, fill=finalRace)) +
  geom_jitter(size = 4, width=0.2) +
  theme_bw() +
  ggtitle("System by race") +
  # xlab("System") +
  ylab("Count") +
  scale_y_continuous(trans='log10') +
  scale_color_viridis(discrete=T, na.value="grey50") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
  theme(legend.position = "bottom")

We’re also interested in how diverse a given study is, and how many studies include samples from each descriptor. We can easily calculate all of that too, although it is hard to see visually.

geographyStudy <- allSRAFinal %>% count(SRA.Study, finalGeography) %>% drop_na(finalGeography)

# How many studies with any sort of Geography info?
dim(geographyStudy)
## [1] 136   3
length(unique(geographyStudy$SRA.Study))
## [1] 94
# And some quick stats on diversity by study:
geographyStudy %>% count(SRA.Study) %>% summary()
##   SRA.Study               n        
##  Length:94          Min.   :1.000  
##  Class :character   1st Qu.:1.000  
##  Mode  :character   Median :1.000  
##                     Mean   :1.447  
##                     3rd Qu.:1.750  
##                     Max.   :5.000
# But... how many samples?
ggplot(geographyStudy, aes(x = finalGeography, y=n, fill=SRA.Study)) +
  geom_bar(stat="identity") +
  ggtitle("Geography by Study") +
  ylab("Count") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
  theme(legend.position = "none") 

# This is hard to see, so:
ddply(geographyStudy, "finalGeography", summarise, totalN = sum(n), meanN=mean(n), maxN=max(n), nStudies=length(unique((SRA.Study))))
##                   finalGeography totalN     meanN maxN nStudies
## 1                       Americas    164  20.50000   39        8
## 2                           Asia    252  13.26316   66       19
## 3                      East Asia   1333  40.39394  208       33
## 4                         Europe   2937 122.37500  753       24
## 5                       Multiple    206  41.20000  153        5
## 6  North Africa and Western Asia     27   2.70000    9       10
## 7                          Other      3   1.50000    2        2
## 8                     South Asia    722  48.13333  365       15
## 9                 Southeast Asia     83  27.66667   48        3
## 10             Subsaharan Africa    524  30.82353  129       17
# Now adding the finalOrgan dimension
geographyStudyfinalOrgan <- allSRAFinal %>% count(SRA.Study, finalGeography, finalOrgan) %>% drop_na(finalGeography, finalOrgan)

ggplot(geographyStudyfinalOrgan, aes(x = finalGeography, y=n, fill=SRA.Study)) +
  geom_bar(stat="identity") +
  facet_wrap( ~ finalOrgan, nrow=3) +
  ggtitle("Geography by study by organ") +
  ylab("Count") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
  theme(legend.position = "none") 

ggplot(geographyStudyfinalOrgan, aes(x = finalOrgan, y=n, fill=SRA.Study)) +
  geom_bar(stat="identity") +
  facet_wrap( ~ finalGeography, nrow=3) +
  ggtitle("Organ by study by geography") +
  ylab("Count") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
  theme(legend.position = "none") 

# And now really broken down:
ddply(geographyStudyfinalOrgan, c("finalGeography", "finalOrgan"), summarise, totalN = sum(n), meanN=mean(n), maxN=max(n), nStudies=length(unique((SRA.Study))))
##                   finalGeography   finalOrgan totalN      meanN maxN nStudies
## 1                       Americas        blood     37  18.500000   36        2
## 2                       Americas  bone marrow     27  27.000000   27        1
## 3                       Americas        brain     15  15.000000   15        1
## 4                       Americas         iPSC     42  21.000000   24        2
## 5                       Americas        joint      4   4.000000    4        1
## 6                       Americas       muscle     39  39.000000   39        1
## 7                           Asia        blood     69  23.000000   66        3
## 8                           Asia        brain      4   1.000000    1        4
## 9                           Asia        heart      4   4.000000    4        1
## 10                          Asia    intestine      5   5.000000    5        1
## 11                          Asia         iPSC     72  24.000000   39        3
## 12                          Asia        joint     30  30.000000   30        1
## 13                          Asia        liver     48  16.000000   34        3
## 14                          Asia         lung      4   4.000000    4        1
## 15                          Asia       testis      1   1.000000    1        1
## 16                          Asia      thyroid     15  15.000000   15        1
## 17                     East Asia      bladder     10  10.000000   10        1
## 18                     East Asia   blastoderm     22  22.000000   22        1
## 19                     East Asia        blood    497  35.500000  182       14
## 20                     East Asia blood vessel     30  15.000000   20        2
## 21                     East Asia         bone      4   4.000000    4        1
## 22                     East Asia  bone marrow     63  21.000000   23        3
## 23                     East Asia       cancer     15  15.000000   15        1
## 24                     East Asia        heart    106 106.000000  106        1
## 25                     East Asia    intestine    361 120.333333  208        3
## 26                     East Asia         iPSC     12  12.000000   12        1
## 27                     East Asia        joint      2   2.000000    2        1
## 28                     East Asia       kidney     20  20.000000   20        1
## 29                     East Asia        liver     50  25.000000   32        2
## 30                     East Asia       morula     41  41.000000   41        1
## 31                     East Asia       muscle     40  40.000000   40        1
## 32                     East Asia         skin     60  60.000000   60        1
## 33                        Europe        blood    969 107.666667  375        9
## 34                        Europe       breast     23  23.000000   23        1
## 35                        Europe        heart     72  24.000000   46        3
## 36                        Europe    intestine    258 129.000000  210        2
## 37                        Europe         iPSC    884 176.800000  330        5
## 38                        Europe       muscle    687 343.500000  671        2
## 39                        Europe     prostate     16  16.000000   16        1
## 40                        Europe         skin     24  24.000000   24        1
## 41                      Multiple        blood    180  60.000000  153        3
## 42                      Multiple    intestine     26  13.000000   25        2
## 43 North Africa and Western Asia        blood     17   5.666667    9        3
## 44 North Africa and Western Asia blood vessel      4   2.000000    3        2
## 45 North Africa and Western Asia        brain      4   1.000000    1        4
## 46 North Africa and Western Asia         lung      2   2.000000    2        1
## 47                         Other        blood      2   2.000000    2        1
## 48                         Other        heart      1   1.000000    1        1
## 49                    South Asia        blood    672  84.000000  365        8
## 50                    South Asia        heart     26  13.000000   24        2
## 51                    South Asia    intestine      5   5.000000    5        1
## 52                    South Asia         iPSC      2   2.000000    2        1
## 53                    South Asia        joint      4   4.000000    4        1
## 54                    South Asia         skin      1   1.000000    1        1
## 55                    South Asia       testis     12  12.000000   12        1
## 56                Southeast Asia        blood     49  24.500000   48        2
## 57                Southeast Asia        heart     34  34.000000   34        1
## 58             Subsaharan Africa        blood    507  42.250000  129       12
## 59             Subsaharan Africa        heart      1   1.000000    1        1
## 60             Subsaharan Africa    intestine     10  10.000000   10        1
## 61             Subsaharan Africa        joint      2   2.000000    2        1
## 62             Subsaharan Africa         lung      4   2.000000    2        2

These make sense! The Singaporean cohorts will be the three main geographies: (South) Chinese, Malay and Tamil. Ambry is of course invested in diversity, and TB is unlikely to show up in Europe. Also, I am willing to bet any amount of money that the digestive Asian sequencing somehow comes from cancer samples too.
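
Since the same per-descriptor summary gets computed again for race, the `ddply()` call could be wrapped in a small function. A sketch in base R, assuming a hypothetical helper `summariseDescriptor()` and toy counts shaped like `geographyStudy` (one row per study and descriptor, with a count column `n`):

```r
# Hypothetical helper: per-descriptor totals, mean and max count per study,
# and the number of distinct studies contributing to each descriptor.
summariseDescriptor <- function(counts, descriptor) {
  groups <- split(counts, counts[[descriptor]])
  out <- data.frame(
    descriptor = names(groups),
    totalN     = vapply(groups, function(g) sum(g$n), numeric(1)),
    meanN      = vapply(groups, function(g) mean(g$n), numeric(1)),
    maxN       = vapply(groups, function(g) max(g$n), numeric(1)),
    nStudies   = vapply(groups, function(g) length(unique(g$SRA.Study)), integer(1)),
    stringsAsFactors = FALSE
  )
  rownames(out) <- NULL
  names(out)[1] <- descriptor
  out
}

# Toy counts, invented for illustration only
toyCounts <- data.frame(
  SRA.Study      = c("S1", "S2", "S2", "S3"),
  finalGeography = c("Europe", "Europe", "East Asia", "East Asia"),
  n              = c(10, 30, 5, 15),
  stringsAsFactors = FALSE
)

geoSummary <- summariseDescriptor(toyCounts, "finalGeography")
print(geoSummary)
```

The same function would work on any table with `SRA.Study`, a descriptor column and `n`, so a race version only needs a different column name.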

Anyhow, now we do the same for race:

raceStudy <- allSRAFinal %>% count(SRA.Study, finalRace) %>% drop_na(finalRace)

# How many studies with any sort of Race info?
dim(raceStudy)
## [1] 368   3
length(unique(raceStudy$SRA.Study))
## [1] 190
# And some quick stats on diversity by study:
raceStudy %>% count(SRA.Study) %>% summary()
##   SRA.Study               n        
##  Length:190         Min.   :1.000  
##  Class :character   1st Qu.:1.000  
##  Mode  :character   Median :2.000  
##                     Mean   :1.937  
##                     3rd Qu.:2.000  
##                     Max.   :7.000
# But... how many samples?
ggplot(raceStudy, aes(x = finalRace, y=n, fill=SRA.Study)) +
  geom_bar(stat="identity") +
  ggtitle("Race by Study") +
  ylab("Count") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
  theme(legend.position = "none") 

# This is hard to see, so:
ddply(raceStudy, "finalRace", summarise, totalN = sum(n), meanN=mean(n), maxN=max(n), nStudies=length(unique((SRA.Study))))
##                                   finalRace totalN     meanN maxN nStudies
## 1         American Indian or Alaskan Native     83  6.384615   40       13
## 2                                     Asian    761 17.697674  112       43
## 3                 Black or African American   2993 29.930000  355      100
## 4                                  Multiple    244 13.555556   63       18
## 5 Native Hawaiian or other Pacific Islander     19  2.375000    5        8
## 6                                     Other    276 17.250000   64       16
## 7                                     White  12882 75.776471 1337      170
# Now adding the finalOrgan dimension
raceStudyfinalOrgan <- allSRAFinal %>% count(SRA.Study, finalRace, finalOrgan) %>% drop_na(finalRace, finalOrgan)

ggplot(raceStudyfinalOrgan, aes(x = finalRace, y=n, fill=SRA.Study)) +
  geom_bar(stat="identity") +
  facet_wrap( ~ finalOrgan, nrow=3) +
  ggtitle("Race by study by organ") +
  ylab("Count") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
  theme(legend.position = "none") 

ggplot(raceStudyfinalOrgan, aes(x = finalOrgan, y=n, fill=SRA.Study)) +
  geom_bar(stat="identity") +
  facet_wrap( ~ finalRace, nrow=3) +
  ggtitle("Organ by study by race") +
  ylab("Count") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
  theme(legend.position = "none") 

# And now really broken down:
ddply(raceStudyfinalOrgan, c("finalRace", "finalOrgan"), summarise, totalN = sum(n), meanN=mean(n), maxN=max(n), nStudies=length(unique((SRA.Study))))
##                                     finalRace      finalOrgan totalN      meanN maxN nStudies
## 1           American Indian or Alaskan Native           blood     70  10.000000   40        7
## 2           American Indian or Alaskan Native    blood vessel      2   2.000000    2        1
## 3           American Indian or Alaskan Native          breast      2   1.000000    1        2
## 4           American Indian or Alaskan Native          cancer      5   2.500000    3        2
## 5           American Indian or Alaskan Native           heart      1   1.000000    1        1
## 6           American Indian or Alaskan Native            nose      3   3.000000    3        1
## 7                                       Asian           blood    544  22.666667  112       24
## 8                                       Asian    blood vessel     15  15.000000   15        1
## 9                                       Asian     bone marrow      1   1.000000    1        1
## 10                                      Asian           brain      6   6.000000    6        1
## 11                                      Asian          breast     47  15.666667   36        3
## 12                                      Asian          cancer     13  13.000000   13        1
## 13                                      Asian           heart      1   1.000000    1        1
## 14                                      Asian       intestine      9   9.000000    9        1
## 15                                      Asian            iPSC     69  17.250000   55        4
## 16                                      Asian            nose      3   3.000000    3        1
## 17                                      Asian     oral cavity     12  12.000000   12        1
## 18                                      Asian           ovary     20   6.666667   11        3
## 19                                      Asian             PNS      9   9.000000    9        1
## 20                                      Asian            skin      2   2.000000    2        1
## 21                                      Asian          uterus     10  10.000000   10        1
## 22                  Black or African American         adipose     22  22.000000   22        1
## 23                  Black or African American           blood   1815  41.250000  355       44
## 24                  Black or African American    blood vessel      9   4.500000    8        2
## 25                  Black or African American     bone marrow     12   6.000000    9        2
## 26                  Black or African American           brain    278  25.272727   87       11
## 27                  Black or African American          breast    126  25.200000   41        5
## 28                  Black or African American          cancer     59  14.750000   53        4
## 29                  Black or African American             CNS      2   2.000000    2        1
## 30                  Black or African American           heart    177  35.400000  124        5
## 31                  Black or African American       intestine    173  43.250000  135        4
## 32                  Black or African American            iPSC     22  11.000000   20        2
## 33                  Black or African American          kidney      1   1.000000    1        1
## 34                  Black or African American           liver     17   8.500000   16        2
## 35                  Black or African American            lung     46  15.333333   36        3
## 36                  Black or African American      lymph node     20  20.000000   20        1
## 37                  Black or African American          muscle     13  13.000000   13        1
## 38                  Black or African American            nose     29  14.500000   26        2
## 39                  Black or African American     oral cavity     19   9.500000   18        2
## 40                  Black or African American           ovary      4   2.000000    3        2
## 41                  Black or African American pituitary gland      3   3.000000    3        1
## 42                  Black or African American        placenta     32  32.000000   32        1
## 43                  Black or African American             PNS      3   3.000000    3        1
## 44                  Black or African American        prostate     48  24.000000   32        2
## 45                  Black or African American            skin     27   5.400000   13        5
## 46                  Black or African American          tonsil      1   1.000000    1        1
## 47                  Black or African American   urinary tract      4   4.000000    4        1
## 48                  Black or African American          uterus     21  10.500000   19        2
## 49                  Black or African American          vagina      8   8.000000    8        1
## 50                                   Multiple         adipose      1   1.000000    1        1
## 51                                   Multiple         bladder      1   1.000000    1        1
## 52                                   Multiple           blood    212  21.200000   63       10
## 53                                   Multiple          cancer     12   4.000000    5        3
## 54                                   Multiple           heart      2   2.000000    2        1
## 55                                   Multiple       intestine      3   1.500000    2        2
## 56                                   Multiple            lung      5   1.666667    2        3
## 57                                   Multiple          muscle      1   1.000000    1        1
## 58                                   Multiple            nose      4   4.000000    4        1
## 59                                   Multiple          spleen      1   1.000000    1        1
## 60                                   Multiple         stomach      1   1.000000    1        1
## 61                                   Multiple          thymus      1   1.000000    1        1
## 62  Native Hawaiian or other Pacific Islander           blood      8   2.666667    5        3
## 63  Native Hawaiian or other Pacific Islander           brain     11   2.200000    3        5
## 64                                      Other           blood    263  21.916667   64       12
## 65                                      Other           brain      6   6.000000    6        1
## 66                                      Other          cancer      1   1.000000    1        1
## 67                                      Other            nose      3   3.000000    3        1
## 68                                      Other           ovary      3   3.000000    3        1
## 69                                      White         adipose     84  28.000000   54        3
## 70                                      White   adrenal gland      3   1.500000    2        2
## 71                                      White         bladder      1   1.000000    1        1
## 72                                      White           blood   7916 134.169492 1337       59
## 73                                      White    blood vessel    115  16.428571   48        7
## 74                                      White     bone marrow     50  12.500000   22        4
## 75                                      White           brain   1357  61.681818  260       22
## 76                                      White          breast    300  37.500000  148        8
## 77                                      White          cancer    620 124.000000  501        5
## 78                                      White       cartilage      1   1.000000    1        1
## 79                                      White             CNS     13  13.000000   13        1
## 80                                      White digestive tract      2   2.000000    2        1
## 81                                      White             eye     47  23.500000   31        2
## 82                                      White           heart    349  34.900000  237       10
## 83                                      White       intestine    166  27.666667   67        6
## 84                                      White            iPSC    502  45.636364  252       11
## 85                                      White           joint     50  50.000000   50        1
## 86                                      White          kidney      6   2.000000    3        3
## 87                                      White          larynx      1   1.000000    1        1
## 88                                      White           liver    192  96.000000  191        2
## 89                                      White            lung    241  20.083333   92       12
## 90                                      White      lymph node      2   2.000000    2        1
## 91                                      White          muscle     42  21.000000   40        2
## 92                                      White            nose     72  18.000000   29        4
## 93                                      White     oral cavity     46  23.000000   30        2
## 94                                      White           ovary    194  32.333333  115        6
## 95                                      White        pancreas      2   2.000000    2        1
## 96                                      White pituitary gland      4   4.000000    4        1
## 97                                      White        prostate    143  47.666667   94        3
## 98                                      White            skin    160  14.545455   48       11
## 99                                      White          spleen      2   2.000000    2        1
## 100                                     White         stomach      3   1.500000    2        2
## 101                                     White          testis      3   1.000000    1        3
## 102                                     White         thyroid     20  10.000000   19        2
## 103                                     White          tonsil      5   5.000000    5        1
## 104                                     White         trachea     12  12.000000   12        1
## 105                                     White   urinary tract     47  47.000000   47        1
## 106                                     White          uterus     19   6.333333   16        3
## 107                                     White          vagina      8   8.000000    8        1

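Wowowowow, the difference is stark: White has 7916 blood samples across 59 studies, versus 1815 across 44 for Black or African American. It would be nice to slice this by country of sampling to see whether this is really driven by the USA, or whether studies from elsewhere are using race terms too…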
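
A minimal sketch of what that slicing could look like, in base R. It runs on a toy stand-in for `allSRAFinal` rather than the real table, and the `country` column is a hypothetical name (the real metadata may call the sampling country something else, or need cleaning first). Only `SRA.Study` and `finalRace` come from the code above.

```r
# Toy stand-in for allSRAFinal. `country` is an assumed column name,
# not one confirmed to exist in the real data.
toy <- data.frame(
  SRA.Study = c("S1", "S1", "S2", "S2", "S3", "S3"),
  finalRace = c("White", "White", "Black or African American",
                "White", "Asian", "Asian"),
  country   = c("USA", "USA", "USA", "USA", "Japan", "Japan"),
  stringsAsFactors = FALSE
)

# Samples per race term, split by country of sampling
samplesByCountry <- table(toy$finalRace, toy$country)

# Number of distinct studies using each race term in each country
studiesByCountry <- aggregate(
  SRA.Study ~ finalRace + country,
  data = toy,
  FUN = function(x) length(unique(x))
)
names(studiesByCountry)[3] <- "studies"

# Fraction of race-labelled samples that were collected in the USA
usaShare <- mean(toy$country == "USA")

print(samplesByCountry)
print(studiesByCountry)
print(usaShare)
```

If most of the race-labelled samples and studies land in the USA column, that would support the idea that the pattern is US-driven. A meaningful number of studies elsewhere would suggest race terms are in wider use.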